Abstract. Genetic investigations often involve the testing of vast numbers of related hypotheses simultaneously. To control the overall error rate, a substantial penalty is required, making it difficult to detect signals of moderate strength. To improve the power in this setting, a number of authors have considered using weighted p-values, with the motivation often based upon the scientific plausibility of the hypotheses. We review this literature, derive optimal weights and show that the power is remarkably robust to misspecification of these weights. We consider two methods for choosing weights in practice. The first, external weighting, is based on prior information. The second, estimated weighting, uses the data to choose weights.

Key words and phrases: Bonferroni correction, multiple testing, weighted p-values.



1. INTRODUCTION 

Testing for association between genetic variation and a complex disease typically requires scanning hundreds of thousands of genetic polymorphisms. In a multiple testing situation, such as a genome-wide association study (GWAS), the null hypothesis is rejected for any test that achieves a p-value less than a predetermined threshold (usually on the order of $10^{-8}$). Data from these investigations have renewed interest in the multiple testing problem. The introduction of the false discovery rate and a procedure to control it by Benjamini and Hochberg (1995) inspired hope that this would be an effective way to control error while increasing power (Storey and Tibshirani, 2003; Sabatti, Service and Freimer, 2003). To further bolster power, recent statistical methods have been proposed that up-weight and down-weight hypotheses, based on prior likelihood of association with the phenotype (Genovese, Roeder and Wasserman, 2006; Roeder et al., 2006; Roeder, Wasserman and Devlin, 2007; Wang, Li and Bucan, 2007). Such prior information is often available in practice.

Kathryn Roeder is Professor of Statistics, Department of Statistics, Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, PA 15213, USA (e-mail: roeder@stat.cmu.edu). Larry Wasserman is Professor of Statistics, Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, PA 15213, USA (e-mail: wasserman@stat.cmu.edu).

This is an electronic reprint of the original article published by the Institute of Mathematical Statistics in Statistical Science, 2009, Vol. 24, No. 4, 398-413. This reprint differs from the original in pagination and typographic detail.

Weighted procedures multiply the threshold by a weight $w_j$ for each test, raising the threshold when $w_j > 1$ and lowering it when $w_j < 1$. To control the overall rate of false positives, a budget must be imposed on the weighting scheme, so that the average weight is one. If the weights are informative, the procedure improves power substantially, but if the weights are uninformative, the loss in power is usually small. Surprisingly, aside from this budget requirement, any set of nonnegative weights is valid (Genovese, Roeder and Wasserman, 2006). While desirable in some respects, this flexibility makes it difficult to select weights for a particular analysis.

The first such weighting scheme appears to be 
Holm (1979). Related ideas can be found in Ben- 
jamini and Hochberg (1997), Chen et al. (2000), 
Genovese, Roeder and Wasserman (2006), Kropf et al. 
(2004), Rosenthal and Rubin (1983), Schuster, Kropf 
and Roeder (2004), Westfall and Krishen (2001), Westfall, Kropf and Finos (2004), Blanchard and Roquain (2008) and Roquain and van de Wiel (2008), among others. Several of these approaches use data-dependent weights and yet maintain familywise error control. There are, of course, other ways to improve power aside from weighting. Some notable recent approaches include Rubin, Dudoit and van der Laan (2006), Storey (2007), Donoho and Jin (2004), Signoravitch (2006), Westfall, Krishen and Young (1998), Westfall and Soper (2001), Efron (2007) and Sun and Cai (2007). Of these, our approach is closest to Rubin, Dudoit and van der Laan (2006).

In some cases, the optimal weights can be esti- 
mated from the data. An approach developed by 
Westfall, Kropf and Finos (2004) utilizes quadratic 
forms to construct such weights; however, this ap- 
proach assumes the individual measurements are nor- 
mally distributed. This approach is suited to appli- 
cations such as microarray data for which the obser- 
vations are approximately normally distributed. We 
are interested in applications such as tests for ge- 
netic association. In this setting the individual ob- 
servations are discrete, but the test statistics are 
approximately normally distributed. 

In general, p-value weighting raises several im- 
portant questions. How much power can we gain if 
we guess well in the weight assignment? How much 
power can we lose if we guess poorly? In this pa- 
per we show that the optimal weights have a sim- 
ple parametric form and we investigate various ap- 
proaches for estimating these weights. We also show 
the power is very robust to misspecification of the 
weights. In particular, in Section 3 we show that 
(i) sparse weights (few large weights and minimum 
weight close to 1) lead to huge power gains for well 
specified weights, but minute power loss for poorly 
specified weights; and (ii) in the nonsparse case, un- 
der weak conditions, the worst case power for poorly 
specified weights is typically better than the power 
obtained using equal weights. 

We consider two methods for choosing the weights: 
(i) external weights, where prior information (based 
on scientific knowledge or prior data) singles out 
specific hypotheses (Section 4) and (ii) estimated 
weights where the data are used to construct weights 
(Section 5). External weights are prone to bias, while 
estimated weights are prone to variability. The two 
robustness properties reduce concerns about bias 
and variance. 



To motivate this work, consider an example (Figure 1) of external weighting that arises in genetic epidemiology. To identify variants of genes that induce greater susceptibility to disease, two types of studies (linkage and association) are often performed. Whole genome linkage analysis has been conducted for most major diseases. These data can be summarized by a linkage trace, a smooth stochastic process $\{Z(s): s \in [0, L]\}$, where each $s$ corresponds to a location on the genome. At points that correspond to a variant of a gene of interest, the mean of the process $\mu(s) = E(Z(s))$ is a large positive value; however, due to extensive spatial correlation in the process, $\mu(s)$ is also nonzero in the vicinity of the variant. Tests for association between genetic polymorphisms and disease status for each of many genetic markers across the genome are also of interest. Like linkage analysis, the association statistics $\{T_j: j = 1, \ldots, m\}$ map to spatial locations $\{s_j: j = 1, \ldots, m\}$ on the genome. The number of tests $m$ can be large, on the order of 1,000,000. Until recently, whole genome association analysis was prohibitively expensive, but technological advances have now made such studies feasible. Due to the multiple testing correction, it is difficult to achieve sufficient power to obtain definitive results in these studies. The linkage trace provides one obvious source of information from which the weights can be constructed; see Section 6 for further elaboration. Unlike linkage analysis, however, the spatial correlation in association tests is weak. For this reason, other choices such as genetic pathways could offer a more promising source for weights in the future.

2. BACKGROUND 
2.1 Multiple Testing 

Consider a multiple testing situation in which $m$ tests are being performed. Suppose $m_0$ of the null hypotheses are true and $m_1 = m - m_0$ null hypotheses are false. We can categorize the $m$ tests as in Table 1. In this notation $F$ is the number of false positives. To control the familywise error rate, it is traditional to bound $P(F > 0)$ at $\alpha$. The simplest way to control this probability is to reject only those tests for which the p-value is less than $\alpha/m$; this is called the Bonferroni procedure. By the union bound, it keeps $P(F > 0) \le \alpha$ whether or not the tests are independent.

Fig. 1. Linkage trace and weights for 6 chromosomes. The trace is the linkage statistic plotted as a function of position on the chromosome. The shading indicates which p-values were up/down weighted. The upspike is the association test statistic. The 3 downspikes indicate tests that were rejected using the binary weights.

In 1995 Benjamini and Hochberg (BH) introduced a new approach to multiple hypothesis testing that controls the false discovery rate (FDR), defined as the expected fraction of false rejections among those hypotheses rejected. Let $P_{(1)} \le \cdots \le P_{(m)}$ be the ordered p-values from $m$ hypothesis tests, with $P_{(0)} = 0$. Then, the BH procedure rejects any null hypothesis for which $P \le T$, with

$$T = \max\left\{P_{(i)}: P_{(i)} \le \frac{\alpha i}{m}\right\}.$$

The FDR is of more scientific relevance than the overall type I error rate in GWAS. Also, the procedure is more powerful than the Bonferroni method. Adaptive variants of the procedure can increase power further at little additional computational expense; see Benjamini, Krieger and Yekutieli (2006) and Storey (2002).
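The step-up rule above translates almost line for line into code. The sketch below is our own illustration in Python with NumPy (the function name `bh_reject` is hypothetical): it locates the largest ordered p-value $P_{(i)}$ that lies under its boundary $\alpha i/m$ and rejects every hypothesis at or below it.

```python
import numpy as np


def bh_reject(pvals, alpha):
    """Benjamini-Hochberg step-up rule: reject H_j when P_j <= T,
    where T = max{P_(i) : P_(i) <= alpha * i / m}."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    sorted_p = np.sort(p)
    # Compare each ordered p-value P_(i) with its BH boundary alpha * i / m.
    below = sorted_p <= alpha * np.arange(1, m + 1) / m
    if not below.any():
        return np.zeros(m, dtype=bool)  # no P_(i) qualifies: reject nothing
    threshold = sorted_p[np.nonzero(below)[0].max()]
    return p <= threshold


# P = 0.02 and 0.03 are rejected by BH even though both exceed the
# Bonferroni cutoff 0.05 / 4 = 0.0125.
print(bh_reject([0.01, 0.02, 0.03, 0.5], alpha=0.05))
```

Because the rule scans for the *largest* qualifying index, a p-value that fails its own boundary can still be rejected if a larger one passes (for example, four p-values all equal to 0.04 are all rejected at $\alpha = 0.05$).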

Table 1
2 × 2 classification of m hypothesis tests

                 H0 rejected   H0 not rejected   Total
    H0 true      F             m0 - F            m0
    H0 false     T             m1 - T            m1
    Total        S             m - S             m

BH controls the false discovery rate at level $\alpha m_0/m$, where $m_0$ is the number of true null hypotheses. Under certain dependence assumptions on the p-values, this is true regardless of how many nulls are true and regardless of the distribution of the p-values under the alternatives (Benjamini and Yekutieli, 2001; Blanchard and Roquain, 2009; Sarkar, 2002). Under some distributional assumptions, Genovese and Wasserman (2002) show that, asymptotically, the BH method corresponds to rejecting all p-values less than a particular p-value threshold $u^*$. Specifically, $u^*$ is the solution to the equation $H(u) = \beta u$, where $\beta = (1/\alpha - a_0)/(1 - a_0)$, $a_0 = m_0/m$ and $H$ is the (common) distribution of the p-value under the alternative. The key result is that $\alpha/m < u^* < \alpha$, which shows that the BH method is intermediate between Bonferroni (corresponding to $\alpha/m$) and uncorrected testing (corresponding to $\alpha$). If $a_0$ is close to 1, however, as it usually is in GWAS, then $\beta$ is a very large quantity and the power of the BH procedure is not much improved over the Bonferroni procedure.
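To see numerically how $u^*$ shrinks as $a_0$ approaches 1, the sketch below solves $H(u) = \beta u$ with SciPy. It assumes, purely for illustration, one-sided tests whose alternatives all have the same mean $\xi$, so that $H(u) = \bar\Phi(\bar\Phi^{-1}(u) - \xi)$; SciPy's `norm.sf` and `norm.isf` play the roles of $\bar\Phi$ and $\bar\Phi^{-1}$, and the function name `bh_limit_threshold` is hypothetical.

```python
from scipy.optimize import brentq
from scipy.stats import norm


def bh_limit_threshold(alpha, a0, xi):
    """Solve H(u) = beta * u with beta = (1/alpha - a0) / (1 - a0).

    Assumes (illustration only) one-sided N(xi, 1) alternatives, so the
    alternative p-value cdf is H(u) = Phibar(Phibar^{-1}(u) - xi).
    """
    beta = (1.0 / alpha - a0) / (1.0 - a0)

    def gap(u):
        return norm.sf(norm.isf(u) - xi) - beta * u

    lo = 1e-15
    if gap(lo) <= 0:  # H(u)/u never reaches beta: threshold is effectively 0
        return 0.0
    # gap(1) = 1 - beta < 0 because beta > 1, so a root lies in (lo, 1).
    return brentq(gap, lo, 1.0)


for a0 in (0.9, 0.99, 0.999):
    print(a0, bh_limit_threshold(alpha=0.05, a0=a0, xi=3.0))
```

Under this normal-shift assumption $H(u)/u$ is decreasing, so the root is unique; as $a_0 \to 1$ the slope $\beta$ explodes and $u^*$ moves toward the Bonferroni scale.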

The power of the BH method can be improved 
with adaptations. Blanchard and Roquain (2008) 
have given numerical comparisons of different adap- 
tive procedures under dependence. Romano, Shaikh 
and Wolf (2008) have considered improving the adap- 
tive procedure of Benjamini, Krieger and Yekutieli 
(2006) using the bootstrap. Sarkar and Heller (2008) 
have noted that the adaptive procedure of Benjamini 
et al. may not perform well compared to Storey's 
(2002) procedure for certain parameter choices. 

2.2 Weighted Multiple Testing 

We are given hypotheses $H = (H_1, \ldots, H_m)$ and standardized test statistics $T = (T_1, \ldots, T_m)$, where $T_j \sim N(\xi_j, 1)$. Likewise, $T_j^2 \sim \chi^2_1(\xi_j^2)$. For a two-sided hypothesis, $H_j = 1$ if $\xi_j \ne 0$ and $H_j = 0$ otherwise. For the sake of parsimony, unless otherwise noted, results will be stated for a one-sided test where $H_j = 1$ if $\xi_j > 0$, although the results extend easily to the two-sided case. Let $\theta = (\xi_1, \ldots, \xi_m)$ denote the vector of means.

The p-values associated with the tests are $P = (P_1, \ldots, P_m)$, where $P_j = \bar\Phi(T_j)$, $\bar\Phi = 1 - \Phi$ and $\Phi$ denotes the standard Normal cdf. Let $P_{(1)} \le \cdots \le P_{(m)}$ denote the sorted p-values and let $T_{(1)} \ge \cdots \ge T_{(m)}$ denote the sorted test statistics.

A rejection set $\mathcal{R}$ is a subset of $\{1, \ldots, m\}$. Say that $\mathcal{R}$ controls familywise error at level $\alpha$ if $P(\mathcal{R} \cap \mathcal{H}_0 \ne \emptyset) \le \alpha$, where $\mathcal{H}_0 = \{j: H_j = 0\}$. The Bonferroni rejection set is $\mathcal{R} = \{j: P_j \le \alpha/m\} = \{j: T_j \ge z_{\alpha/m}\}$, where we use the notation $z_\beta = \bar\Phi^{-1}(\beta)$.

The weighted Bonferroni procedure (Rosenthal and Rubin, 1983; Genovese, Roeder and Wasserman, 2006) is as follows. Specify nonnegative weights $w = (w_1, \ldots, w_m)$ and reject hypothesis $H_j$ if $j \in \mathcal{R}$, where

$$\mathcal{R} = \left\{j: \frac{P_j}{w_j} \le \frac{\alpha}{m}\right\}. \qquad (1)$$

In the following lemma we show that as long as $\bar w = m^{-1}\sum_j w_j = 1$, the rejection set $\mathcal{R}$ controls familywise error at level $\alpha$. The second lemma includes a simple modification that will be needed later.



Lemma 2.1. If $\bar w = 1$, then $\mathcal{R}$ controls familywise error at level $\alpha$.

Lemma 2.2. Suppose that $W_j = g(V_j, c)$, $j = 1, \ldots, m$, for some random variables $V_1, \ldots, V_m$, some constant $c$ and some function $g$. Further, suppose that $V_j$ has a known distribution $H$ whenever $j \in \mathcal{H}_0$ and that $P_j$ is independent of $V_j$ for all $j \in \mathcal{H}_0$. The rule that rejects when $P_j \le \alpha W_j/m$ controls familywise error at level $\alpha$ if $c$ is chosen to satisfy $E_H(g(V_j, c)) \le 1$.
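A minimal Python sketch of rule (1), assuming NumPy; the function name `weighted_bonferroni` is ours. It enforces the budget $\bar w = 1$ from Lemma 2.1 and then runs a small Monte Carlo check under the global null, with arbitrary (randomly drawn, then normalized) weights, that the familywise error stays near or below $\alpha$.

```python
import numpy as np


def weighted_bonferroni(pvals, weights, alpha):
    """Reject H_j when P_j / w_j <= alpha / m, i.e. P_j <= alpha * w_j / m.

    The weights must be nonnegative and average to one (Lemma 2.1).
    """
    p = np.asarray(pvals, dtype=float)
    w = np.asarray(weights, dtype=float)
    if p.shape != w.shape:
        raise ValueError("need one weight per p-value")
    if np.any(w < 0) or not np.isclose(w.mean(), 1.0):
        raise ValueError("weights must be nonnegative with mean 1")
    return p <= alpha * w / p.size


# Up-weighting the first hypothesis lets P_1 = 0.03 through (cutoff ~0.0417),
# at the price of a stricter cutoff (~0.0042) for the other two.
print(weighted_bonferroni([0.03, 0.01, 0.001], [2.5, 0.25, 0.25], alpha=0.05))

# Monte Carlo check of Lemma 2.1 under the global null (all p-values uniform).
rng = np.random.default_rng(0)
m, reps = 50, 20000
w = rng.exponential(size=m)
w /= w.mean()  # enforce the budget: average weight one
p_null = rng.uniform(size=(reps, m))
fwer = np.mean((p_null <= 0.05 * w / m).any(axis=1))
print(f"estimated FWER: {fwer:.4f}")
```

The union bound gives $\sum_{j \in \mathcal{H}_0} \alpha w_j/m \le \alpha$ for any weights with mean one, which is why the simulated error rate does not depend on how the weights were chosen.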

Genovese, Roeder and Wasserman (2006) also showed that false discovery methods benefit from weighting. Recall that the false discovery proportion (FDP) is

$$\mathrm{FDP} = \frac{\text{number of false rejections}}{\text{number of rejections}} = \frac{|\mathcal{R} \cap \mathcal{H}_0|}{|\mathcal{R}|},$$

where the ratio is defined to be 0 if the denominator is 0. The false discovery rate (FDR) is $\mathrm{FDR} = E(\mathrm{FDP})$. Benjamini and Hochberg (1995) proved $\mathrm{FDR} \le \alpha$ if $\mathcal{R} = \{j: P_j \le T\}$, where $T = \max\{P_{(j)}: P_{(j)} \le j\alpha/m\}$. Genovese, Roeder and Wasserman (2006) showed that $\mathrm{FDR} \le \alpha$ if the $P_j$'s are replaced by $Q_j = P_j/W_j$, provided $\bar W = 1$. This paper focuses on familywise error using the weighted procedure (1). Similar results hold for FDR and other familywise controlling procedures such as Holm's test.
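The weighted FDR procedure just described amounts to running BH on $Q_j = P_j/W_j$. A self-contained sketch in Python with NumPy follows (the name `weighted_bh` is hypothetical); a zero weight is mapped to $Q_j = \infty$, so such a hypothesis can never be rejected.

```python
import numpy as np


def weighted_bh(pvals, weights, alpha):
    """BH applied to the weighted p-values Q_j = P_j / W_j (mean weight 1)."""
    p = np.asarray(pvals, dtype=float)
    w = np.asarray(weights, dtype=float)
    if np.any(w < 0) or not np.isclose(w.mean(), 1.0):
        raise ValueError("weights must be nonnegative with mean 1")
    with np.errstate(divide="ignore", invalid="ignore"):
        q = np.where(w > 0, p / w, np.inf)  # zero weight: never rejected
    m = q.size
    sorted_q = np.sort(q)
    below = sorted_q <= alpha * np.arange(1, m + 1) / m
    if not below.any():
        return np.zeros(m, dtype=bool)
    return q <= sorted_q[np.nonzero(below)[0].max()]


# With equal weights this is ordinary BH.
print(weighted_bh([0.01, 0.02, 0.03, 0.5], [1, 1, 1, 1], alpha=0.05))
# Unweighted BH rejects nothing here; putting all the weight on the first
# two hypotheses rejects both of them.
print(weighted_bh([0.03, 0.035, 0.5, 0.9], [1, 1, 1, 1], alpha=0.05))
print(weighted_bh([0.03, 0.035, 0.5, 0.9], [2, 2, 0, 0], alpha=0.05))
```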

3. POWER, ROBUSTNESS AND 
OPTIMALITY 

The optimal weights, derived below, can be re-expressed as optimal cutoffs for testing. Specifically, under the optimal weights, rejecting when $P_j/w_j \le \alpha/m$ is the same as rejecting when $T_j \ge \xi_j/2 + c/\xi_j$. This result can be obtained from Spjøtvoll (1972) and is identical to the result in Rubin, Dudoit and van der Laan (2006), obtained independently. The remainder of the paper, which shows some good properties of the weighted method, can thus also be considered as providing support for their method for selecting test-specific cutoffs. In particular, the simulations of Rubin et al. (2006) indicate that even poorly specified estimates of the cutoffs $\xi_j/2 + c/\xi_j$ can still perform well. In this section we provide insight into why this is true.

The power of a single, one-sided alternative in the unweighted case ($w_j = 1$) is

$$\pi(\xi_j, 1) = P(T_j > z_{\alpha/m}) = \bar\Phi(z_{\alpha/m} - \xi_j).$$

The power¹ in the weighted case is

$$\pi(\xi_j, w_j) = P\left(P_j < \frac{\alpha w_j}{m}\right) = P\left(T_j > z_{\alpha w_j/m}\right) = \bar\Phi\left(z_{\alpha w_j/m} - \xi_j\right). \qquad (2)$$

Weighting increases the power when $w_j > 1$ and decreases the power when $w_j < 1$ for the $j$th alternative.
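Equation (2) and its two-sided counterpart, $\bar\Phi(z_{\alpha w/2m} - \xi) + \bar\Phi(z_{\alpha w/2m} + \xi)$, can be evaluated directly with SciPy, where `norm.sf` is $\bar\Phi$ and `norm.isf` gives $z_\beta = \bar\Phi^{-1}(\beta)$. The function names below are ours. The loop shows that an alternative sitting exactly at the Bonferroni cutoff $z_{\alpha/m}$ has power 1/2 at weight 1, and how that power moves as $w$ changes.

```python
from scipy.stats import norm


def power_one_sided(xi, w, alpha, m):
    """Equation (2): pi(xi, w) = Phibar(z_{alpha w / m} - xi)."""
    return norm.sf(norm.isf(alpha * w / m) - xi)


def power_two_sided(xi, w, alpha, m):
    """Two-sided power: Phibar(z - xi) + Phibar(z + xi), z = z_{alpha w / (2m)}."""
    z = norm.isf(alpha * w / (2 * m))
    return norm.sf(z - xi) + norm.sf(z + xi)


alpha, m = 0.05, 1000
xi = norm.isf(alpha / m)  # alternative located at the Bonferroni cutoff
for w in (0.5, 1.0, 2.0, 10.0):
    print(w, round(power_one_sided(xi, w, alpha, m), 3))
```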

Given $\theta = (\xi_1, \ldots, \xi_m)$ and $w = (w_1, \ldots, w_m)$, we define the average power

$$\frac{1}{m_1}\sum_{j=1}^m \pi(\xi_j, w_j)\, I(\xi_j > 0),$$

where $m_1 = \sum_{j=1}^m I(\xi_j > 0)$. More generally, if $\xi$ is drawn from a distribution $Q$ and $w = w(\xi)$ is a weight function, we define the average power

$$\frac{\int \pi(\xi, w(\xi))\, I(\xi > 0)\, dQ(\xi)}{\int I(\xi > 0)\, dQ(\xi)}.$$

If we take $Q$ to be the empirical distribution of $(\xi_1, \ldots, \xi_m)$, then this reduces to the previous expression. In this formulation we require $w(\xi) \ge 0$ and $\int w(\xi)\, dQ(\xi) = 1$.

In the following theorem we see that the set of optimal weight functions forms a one-parameter family indexed by a constant $c$.

Theorem 3.1. Given $\theta = (\xi_1, \ldots, \xi_m)$, the optimal weight vector $w = (w_1, \ldots, w_m)$ that maximizes the average power subject to $w_j \ge 0$ and $\bar w = 1$ is $w = (\rho_c(\xi_1), \ldots, \rho_c(\xi_m))$, where

$$\rho_c(\xi) = \frac{m}{\alpha}\,\bar\Phi\left(\frac{\xi}{2} + \frac{c}{\xi}\right) I(\xi > 0), \qquad (3)$$

and $c = c(\theta)$ is defined by the condition

$$\frac{1}{m}\sum_{j=1}^m \rho_c(\xi_j) = 1. \qquad (4)$$

The proof, essentially a special case of Spjøtvoll (1972), is in the Appendix. Figure 2 displays the function $\rho_c(\xi)$ for various values of $c$ (the function is normalized to have maximum 1 for easier visualization). The result generalizes to the case where the alternative means are random variables with distribution $Q$, in which case $c$ is defined by $\int \rho_c(\xi)\, dQ(\xi) = 1$.
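Computing the optimal weights only requires solving the one-dimensional equation (4) for $c$. The sketch below (our own, in Python with NumPy/SciPy; `optimal_weights` is a hypothetical name) uses the fact that the mean of $\rho_c$ is decreasing in $c$, from $m_1/\alpha > 1$ as $c \to -\infty$ down to 0, so a bracketing root finder applies. It also evaluates the Lemma 3.2 power $\bar\Phi(c/\xi - \xi/2)$ for one alternative.

```python
import numpy as np
from scipy.optimize import brentq
from scipy.stats import norm


def optimal_weights(xi, alpha):
    """Theorem 3.1: w_j = rho_c(xi_j), with c chosen so that mean(w) = 1 (eq. 4)."""
    xi = np.asarray(xi, dtype=float)
    m = xi.size
    pos = xi > 0
    if not pos.any():
        raise ValueError("need at least one positive mean")

    def rho(c):  # equation (3); zero weight wherever xi_j <= 0
        w = np.zeros(m)
        w[pos] = (m / alpha) * norm.sf(xi[pos] / 2 + c / xi[pos])
        return w

    def excess(c):  # decreasing in c
        return rho(c).mean() - 1.0

    lo, hi = -1.0, 1.0
    while excess(lo) < 0:  # mean(rho) -> m1 / alpha > 1 as c -> -inf
        lo *= 2
    while excess(hi) > 0:  # mean(rho) -> 0 as c -> +inf
        hi *= 2
    c = brentq(excess, lo, hi)
    return rho(c), c


# 900 nulls and 100 alternatives with a common mean of 3.
xi = np.r_[np.zeros(900), np.full(100, 3.0)]
w, c = optimal_weights(xi, alpha=0.05)
oracle = norm.sf(c / 3.0 - 3.0 / 2)  # Lemma 3.2 power at xi = 3
print(w[:3], w[-3:], c, oracle)
```

With a single common alternative mean, all the budget goes to the 100 alternatives, so each receives weight $m/m_1 = 10$ and the nulls receive 0.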

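From (2) and (3) we have immediately:

Lemma 3.2. The power at an alternative with mean $\xi$ under optimal weights is $\bar\Phi(c/\xi - \xi/2)$. The average power under optimal weights, which we call the oracle power, is

$$\frac{1}{m_1}\sum_{j=1}^m \bar\Phi\left(\frac{c}{\xi_j} - \frac{\xi_j}{2}\right) I(\xi_j > 0).$$

¹For a two-sided alternative the power is

$$\pi(\xi_j, w_j) = \bar\Phi\left(z_{\alpha w_j/2m} - \xi_j\right) + \bar\Phi\left(z_{\alpha w_j/2m} + \xi_j\right).$$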

The oracle power is not attainable since the optimal weights depend on $\theta = (\xi_1, \ldots, \xi_m)$. In practice, the weights will either be chosen by prior information or by estimating the $\xi$'s. This raises the following question: how sensitive is the power to correct specification of the weights? Now we show that the power is very robust to weight misspecification.

Property I: Sparse weights (minimum weight close 
to 1) are highly robust. If most weights are less 
than 1 and the minimum weight is close to 1, then 
correct specification (large weights on alternatives) 
leads to large power gains but incorrect specification 
(large weights on nulls) leads to little power loss. 



[Figure 2: panels labeled c = -10 and c = -1.]

Fig. 2. Optimal weight function $\rho_c(\xi)$ for various $c$. In each case $m = 1000$ and $\alpha = 0.05$. The functions are normalized to have maximum 1.
Property II: Worst case analysis. Weighted hypothesis testing, even with poorly chosen weights, typically does as well as or better than Bonferroni, except when the alternative means are large, in which case both have high power.

Let us now make these statements precise. Also, 
see Genovese, Roeder and Wasserman (2006) and 
Roeder et al. (2006) for other results on the effect 
of weight misspecification. 

Property I. Consider first the case where the weights take two distinct values and the alternatives have a common mean $\xi$. Let $\varepsilon$ denote the fraction of hypotheses given the larger of the two weight values, $B$. Then, the weight vector $w$ is proportional to

$$(\underbrace{B, \ldots, B}_{k \text{ terms}}, \underbrace{1, \ldots, 1}_{m-k \text{ terms}}),$$

where $k = \varepsilon m$ and $B > 1$, and, hence, the normalized weights are

$$w = (\underbrace{w_1, \ldots, w_1}_{k \text{ terms}}, \underbrace{w_0, \ldots, w_0}_{m-k \text{ terms}}),$$

where

$$w_1 = \frac{B}{\varepsilon B + (1 - \varepsilon)}, \qquad w_0 = \frac{1}{\varepsilon B + (1 - \varepsilon)}.$$

We say that the weights are sparse if $\varepsilon$ is small. Provided $B$ is considerably less than $1/\varepsilon$, most weights are near 1 in the sparse case.

Rather than investigate the average power, we focus on a single alternative with mean $\xi$. The power gain from up-weighting this hypothesis is the power under weight $w_1$ minus the unweighted power, $\pi(\xi, w_1) - \pi(\xi, 1)$. Similarly, the power loss from down-weighting is $\pi(\xi, 1) - \pi(\xi, w_0)$. The gain minus the loss, which we call the robustness function, is

$$R(B, \varepsilon) = \left(\pi(\xi, w_1) - \pi(\xi, 1)\right) - \left(\pi(\xi, 1) - \pi(\xi, w_0)\right) = \bar\Phi(z_{\alpha w_1/m} - \xi) + \bar\Phi(z_{\alpha w_0/m} - \xi) - 2\bar\Phi(z_{\alpha/m} - \xi).$$

The gain outweighs the loss if and only if $R(B, \varepsilon) > 0$ (Figure 3).

In the sparse weighting scenario $k$ is small and $w_0 \approx 1$ by assumption; consequently, an analysis of $R(B, \varepsilon)$ sheds light on the effect of weighting on power, without the added complications involved in a full analysis of average power.



Theorem 3.3. Fix $B > 1$. Then $\lim_{\varepsilon \to 0} R(B, \varepsilon) > 0$. Moreover, there exists $\varepsilon^*(B) > 0$ such that $R(B, \varepsilon) > 0$ for all $\varepsilon < \varepsilon^*(B)$.
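The robustness function is cheap to tabulate, which makes Theorem 3.3 easy to see numerically. The sketch below (Python with SciPy; function names are ours) evaluates $R(B, \varepsilon)$ at the marginal alternative $\xi = z_{\alpha/m}$, the setting of Figure 3, for a few $(B, \varepsilon)$ pairs. The defaults $\alpha = 0.05$ and $m = 1000$ are illustrative choices.

```python
from scipy.stats import norm


def power(xi, w, alpha, m):
    """Equation (2): one-sided power at mean xi with weight w."""
    return norm.sf(norm.isf(alpha * w / m) - xi)


def robustness(B, eps, alpha=0.05, m=1000, xi=None):
    """R(B, eps) = pi(xi, w1) + pi(xi, w0) - 2 pi(xi, 1) for two-valued weights."""
    if xi is None:
        xi = norm.isf(alpha / m)  # z_{alpha/m}: power 1/2 without weighting
    denom = eps * B + (1 - eps)
    w1, w0 = B / denom, 1 / denom
    return power(xi, w1, alpha, m) + power(xi, w0, alpha, m) - 2 * power(xi, 1.0, alpha, m)


for eps in (0.001, 0.01, 0.1, 0.5):
    print(eps, [round(robustness(B, eps), 3) for B in (2, 10, 50)])
```

Small $\varepsilon$ keeps $w_0$ close to 1, so the loss term nearly vanishes and $R$ stays positive; with $\varepsilon = 0.5$ and a large $B$ the down-weighted half loses more power than the up-weighted half gains.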

We can generalize this beyond the two-valued case as follows. Let $w$ be any weight vector such that $\bar w = 1$. Now define the (worst case) robustness function

$$R(\xi) = \min_{\{j:\, w_j > 1,\, H_j = 1\}} \pi(\xi, w_j) + \min_{\{j:\, w_j < 1,\, H_j = 1\}} \pi(\xi, w_j) - 2\pi(\xi, 1).$$

We will see that $R(\xi) > 0$ under weak conditions and that the maximal robustness is obtained for $\xi$ near the Bonferroni cutoff $z_{\alpha/m}$.

Theorem 3.4. A necessary and sufficient condition for $R(\xi) > 0$ is

$$\Phi(z_{\alpha B/m} - \xi) + \Phi(z_{\alpha b/m} - \xi) - 2\Phi(z_{\alpha/m} - \xi) < 0, \qquad (5)$$

where $B = \min\{w_j: w_j > 1\}$ and $b = \min\{w_j\}$. Moreover,

$$R(\xi) = A(\xi) + O(1 - b),$$

where

$$A(\xi) = \Phi(z_{\alpha/m} - \xi) - \Phi(z_{\alpha B/m} - \xi) > 0,$$

and, as $b \to 1$, $\mu(\{\xi: R(\xi) < 0\}) \to 0$ (with $\mu$ here denoting Lebesgue measure) and $\inf_\xi R(\xi) \to 0$.

Fig. 3. Robustness function for $m = 1000$. In this example, $\xi = z_{\alpha/m}$, which has power 1/2 without weighting. The gain of correct weighting far outweighs the loss for incorrect weighting as long as the fraction of large weights $\varepsilon$ is small.

Based on the theorem, we see that there is overwhelming robustness as long as the minimum weight is near 1. Even in the extreme case $b = 0$, there is still a safe zone, an interval of values of $\xi$ over which $R(\xi) > 0$.

Lemma 3.5. Suppose that $B > 2$. Then there exists $\xi^* > 0$ such that $R_{B,b}(\xi) > 0$ for all $0 < \xi < \xi^*$ and all $b$. An upper bound on $\xi^*$ is $z_{\alpha/m} - 1/(z_{\alpha/m} - z_{B\alpha/m})$.

Property II. Even if the weights are not sparse, the 
power of the weighted test tends to be acceptable. 

The result holds even though the weights themselves can be very sensitive to changes in $\theta$. Consider the following example. Suppose that $\theta = (\xi_1, \ldots, \xi_m)$, where each $\xi_j$ is equal to either 0 or some fixed number $\xi$. The empirical distribution of the $\xi_j$'s is thus $Q = (1 - a)\delta_0 + a\delta_\xi$, where $\delta$ denotes a point mass and $a$ is the fraction of nonzero means. The optimal weights are 0 for $\xi_j = 0$ and $1/a$ for $\xi_j = \xi$. Let $\tilde Q = (1 - a - \gamma)\delta_0 + \gamma\delta_u + a\delta_\xi$, where $u$ is a small positive number. Since we have only moved the mass at 0 to $u$, and $u$ is small, we would hope that $w(\xi)$ will not change much. But this is not the case. Set $\xi = A + \sqrt{A^2 - 2c}$ and $u = B - \sqrt{B^2 - 2c}$, where

$$A = \bar\Phi^{-1}\left(\frac{\alpha}{m(\gamma K + a)}\right), \qquad B = \bar\Phi^{-1}\left(\frac{\alpha K}{m(\gamma K + a)}\right).$$

This arrangement yields weights $w_0$ and $w_1$ on $u$ and $\xi$ such that $w_0/w_1 = K$. For example, if $m = 1000$, $\alpha = 0.05$, $a = 0.1$, $\gamma = 0.1$, $K = 1000$ and $c = 0.1$, then $u = 0.03$ and $\xi = 9.8$. The optimal weight on $\xi$ under $Q$ is 10, but under $\tilde Q$ it is 0.00999 and so is reduced by a factor of 1001. More generally, we have the following result, which shows that the weights are, in a certain sense, a discontinuous function of $\theta$.
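The numbers in this example can be reproduced with a few lines of SciPy. The sketch below (the function name is hypothetical) builds $u$ and $\xi$ from $A$ and $B$ as in the text, so that $\xi/2 + c/\xi = A$ and $u/2 + c/u = B$, evaluates $\rho_c$ from (3) at both points, and checks that the weights satisfy the normalization (4) under $\tilde Q$, namely $\gamma w_0 + a w_1 = 1$.

```python
import math

from scipy.stats import norm


def perturbed_example(m, alpha, a, gamma, K, c):
    """Return u, xi and the optimal weights w0 = rho_c(u), w1 = rho_c(xi)."""
    denom = m * (gamma * K + a)
    A = norm.isf(alpha / denom)
    B = norm.isf(alpha * K / denom)
    xi = A + math.sqrt(A**2 - 2 * c)  # solves xi/2 + c/xi = A
    u = B - math.sqrt(B**2 - 2 * c)   # solves u/2 + c/u = B
    w1 = (m / alpha) * norm.sf(xi / 2 + c / xi)
    w0 = (m / alpha) * norm.sf(u / 2 + c / u)
    return u, xi, w0, w1


u, xi, w0, w1 = perturbed_example(m=1000, alpha=0.05, a=0.1, gamma=0.1, K=1000, c=0.1)
print(f"u = {u:.3f}, xi = {xi:.2f}, w1 = {w1:.5f}, w0/w1 = {w0 / w1:.1f}")
print("normalization:", 0.1 * w0 + 0.1 * w1)  # gamma * w0 + a * w1
```

The weight on $\xi$ comes out as $1/(\gamma K + a) = 1/100.1$, which is the 0.00999 quoted above, compared with $1/a = 10$ under $Q$.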

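Lemma 3.6. Fix $\alpha$ and $m$. For any $\delta > 0$ and $\varepsilon > 0$ there exist $Q = (1 - a)\delta_0 + a\delta_\xi$ and $\tilde Q = (1 - a - \gamma)\delta_0 + \gamma\delta_u + a\delta_\xi$ such that $d(Q, \tilde Q) < \delta$ and $\tilde\rho(\xi)/\rho(\xi) < \varepsilon$, where $a = \alpha/4$, $d(Q, \tilde Q) = \sup_t |Q(-\infty, t] - \tilde Q(-\infty, t]|$ is the Kolmogorov-Smirnov distance, $\rho$ is the optimal weight function for $Q$ and $\tilde\rho$ is the optimal weight function for $\tilde Q$.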



Fortunately, this feature of the weight function 
does not pose a serious hurdle in practice because 
it is possible to have high power even with poor 
weights. In Figure 4 the plots on the left show the 
power as a function of the alternative mean £. The 
dark solid line shows the lowest possible power as- 
suming the weights were estimated as poorly as pos- 
sible (under conditions specified below). The lighter 
solid line is the power of the unweighted (Bonferroni) method. The dotted line shows the power under theoretically optimal weights. The worst case weighted power is typically close to or larger than the Bonferroni power, except for large $\xi$, where both powers are high.

To begin a formal analysis, assume that each mean is either equal to 0 or to $\xi$ for some fixed $\xi > 0$. Thus, the empirical distribution is $Q = (1 - a)\delta_0 + a\delta_\xi$, where $\delta$ denotes a point mass and $a$ is the fraction of nonzero $\xi_j$'s. The optimal weights are $1/a$ for hypotheses whose mean is $\xi$. To study the effect of misspecification error, consider the case where $\gamma m$ nulls are mistaken for alternatives with mean $u > 0$. This corresponds to misspecifying $Q$ to be $\tilde Q = (1 - a - \gamma)\delta_0 + \gamma\delta_u + a\delta_\xi$. We will study the effect of varying $u$, so let $\pi(u)$ denote the power at the true alternative $\xi$ as a function of $u$. Also, let $\pi_{\mathrm{Bonf}}$ denote the power using equal weights (Bonferroni). Note that changing $Q = (1 - a)\delta_0 + a\delta_\xi$ to $Q' = (1 - a)\delta_0 + a\delta_{\xi'}$ for $\xi' \ne \xi$ does not change the weights.

As the weights are a function of $c$, we first need to find $c$ as a function of $u$. The normalization condition (4) reduces to

$$\gamma\,\bar\Phi\left(\frac{u}{2} + \frac{c}{u}\right) + a\,\bar\Phi\left(\frac{\xi}{2} + \frac{c}{\xi}\right) = \frac{\alpha}{m}, \qquad (6)$$

which implicitly defines the function $c(u)$. First we consider what happens when $u$ is restricted to be less than $\xi$.
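Equation (6) can be solved numerically for any $u$, which gives a direct way to trace the worst case power curve of Figure 4. The sketch below is our own illustration (Python with NumPy/SciPy; function names hypothetical): it solves (6) for $c(u)$ by bracketing, computes the power at the true mean, $\pi(u) = \bar\Phi(c(u)/\xi - \xi/2)$, and approximates $\inf_{0 < u < \xi} \pi(u)$ on a grid, using the top-panel settings $a = 0.01$, $m = 1000$, $\alpha = 0.05$, $\gamma = 0.1$.

```python
import numpy as np
from scipy.optimize import brentq
from scipy.stats import norm


def c_of_u(u, xi, a, gamma, alpha, m):
    """Solve (6): gamma*Phibar(u/2 + c/u) + a*Phibar(xi/2 + c/xi) = alpha/m for c."""
    if not alpha / m < gamma + a:
        raise ValueError("need alpha/m < gamma + a for a solution")

    def f(c):  # decreasing in c, from gamma + a - alpha/m down to -alpha/m
        return gamma * norm.sf(u / 2 + c / u) + a * norm.sf(xi / 2 + c / xi) - alpha / m

    lo, hi = -1.0, 1.0
    while f(lo) < 0:
        lo *= 2
    while f(hi) > 0:
        hi *= 2
    return brentq(f, lo, hi)


def misspecified_power(u, xi, a, gamma, alpha, m):
    """Power at the true mean xi when weights are computed under Q-tilde."""
    c = c_of_u(u, xi, a, gamma, alpha, m)
    return norm.sf(c / xi - xi / 2)  # weight rho_c(xi) gives Phibar(c/xi - xi/2)


a, gamma, alpha, m = 0.01, 0.1, 0.05, 1000
for xi in (2.0, 3.0, 4.0, 5.0):
    us = np.linspace(0.05, xi, 200, endpoint=False)  # grid over 0 < u < xi
    worst = min(misspecified_power(u, xi, a, gamma, alpha, m) for u in us)
    bonf = norm.sf(norm.isf(alpha / m) - xi)
    print(f"xi = {xi}: worst weighted {worst:.3f}, Bonferroni {bonf:.3f}")
```

The grid is only an approximation to the infimum; a finer grid or a bounded scalar minimizer over $u$ would tighten it.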

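Theorem 3.7. Assume that $\alpha/m < \gamma + a < 1$. Let $Q = (1 - a)\delta_0 + a\delta_\xi$ and $\tilde Q = (1 - a - \gamma)\delta_0 + \gamma\delta_u + a\delta_\xi$ with $0 < u < \xi$. Let $C(\xi) = \sup_{0 < u < \xi} c(u)$ and define $\xi_0 = z_{\alpha/(m(\gamma + a))}$.

1. For $\xi < \xi_0$, $C(\xi) = \xi\xi_0 - \xi^2/2$. For $\xi > \xi_0$, $C(\xi)$ is the solution to

$$\gamma\,\bar\Phi(\sqrt{2c}) + a\,\bar\Phi\left(\frac{\xi}{2} + \frac{c}{\xi}\right) = \frac{\alpha}{m}.$$

In this case, $C(\xi) = z^2_{\alpha/(m\gamma)}/2 + O(a)$.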
Fig. 4. Power as a function of the alternative mean $\xi$. In these plots, $a = 0.01$, $m = 1000$ and $\alpha = 0.05$. There are $(1 - a)m$ nulls and $ma$ alternatives with mean $\xi$. The left plots show what happens when the weights are incorrectly computed assuming that a fraction $\gamma$ of nulls are actually alternatives with mean $u$. In the top plot, we restrict $0 < u < \xi$. In the second and third plots, no restriction is placed on $u$. The top and middle plots have $\gamma = 0.1$, while the third plot has $\gamma = 1 - a$ (all nulls misspecified as alternatives). The dark solid line shows the lowest possible power assuming the weights were estimated as poorly as possible. The lighter solid line is the power of the unweighted (Bonferroni) method. The dotted line is the power under the optimal weights. The vertical line in the top plot is at $\xi^*$. The weighted method beats unweighted for all $\xi < \xi^*$. The right plot shows the least favorable $u$ as a function of $\xi$, that is, the $u$ for which mistaking $\gamma m$ nulls for alternatives with mean $u$ leads to the worst power. Also shown is the line $u = \xi$.
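2. Let $\xi^* = z_{\alpha/m} + \sqrt{z^2_{\alpha/m} - z^2_q}$, where $q = \alpha(1 - a)/(m\gamma)$. For $\xi < \xi^*$,

$$\inf_{0 < u < \xi} \pi(u) > \pi_{\mathrm{Bonf}}. \qquad (7)$$

For $\xi > \xi^*$ we have

$$\inf_{0 < u < \xi} \pi(u) \ge \bar\Phi\left(\frac{z^2_{\alpha/(m\gamma)}}{2\xi} - \frac{\xi}{2}\right) + O(a). \qquad (8)$$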



(9) 



(10) 



> 1 




0(a) 



The factor $(^2^(1 - aj/j) m 7/(1 - a) is the 
worst case power deficit due to misspecification. Now we drop the assumption that $u < \xi$.

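Theorem 3.8. Let $Q = (1 - a)\delta_0 + a\delta_\xi$ and let $Q_u = (1 - a - \gamma)\delta_0 + \gamma\delta_u + a\delta_\xi$. Let $\pi_u$ denote the power at $\xi$ using the weights computed under $Q_u$.

1. The least favorable $u$ is $u^* = \arg\min_{u > 0} \pi_u = \sqrt{2c^*} = z_{\alpha/(m\gamma)} + O(a)$, where $c^*$ solves

$$\gamma\,\bar\Phi(\sqrt{2c}) + a\,\bar\Phi\left(\frac{\xi}{2} + \frac{c}{\xi}\right) = \frac{\alpha}{m}$$

and $c^* = z^2_{\alpha/(m\gamma)}/2 + O(a)$.

2. The minimal power is

$$\inf_u \pi_u = \bar\Phi\left(\frac{c^*}{\xi} - \frac{\xi}{2}\right) = \bar\Phi\left(\frac{z^2_{\alpha/(m\gamma)}}{2\xi} - \frac{\xi}{2}\right) + O(a).$$

3. A sufficient condition for $\inf_u \pi_u$ to be larger than the power of the Bonferroni method is

$$z_{\alpha/m} - \sqrt{z^2_{\alpha/m} - z^2_{\alpha/(m\gamma)}} < \xi < z_{\alpha/m} + \sqrt{z^2_{\alpha/m} - z^2_{\alpha/(m\gamma)}},$$

up to $O(a)$ terms.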



4. CHOOSING EXTERNAL WEIGHTS

One approach to choosing external weights (or test statistic cutoffs) is to use empirical Bayes methods to model prior information while being careful to preserve error control, as in Westfall and Soper (2001), for example. Here we consider a simple method that takes advantage of the robustness properties we have discussed. We will focus here on the two-valued case. Thus,

$$w = (\underbrace{w_1, \ldots, w_1}_{k \text{ terms}}, \underbrace{w_0, \ldots, w_0}_{m-k \text{ terms}}),$$

where $k = \varepsilon m$, $w_1 = B/(\varepsilon B + (1 - \varepsilon))$ and $w_0 = 1/(\varepsilon B + (1 - \varepsilon))$. In practice, we would typically have a fixed fraction of hypotheses $\varepsilon$ that we want to give more weight to. The question is how to choose $B$. We will focus on choosing $B$ to produce weights with good properties at interesting values of $\xi$. Large values of $\xi$ already have high power. Very small values of $\xi$ have extremely low power and benefit little from weighting. This leads us to focus on constructing weights that are useful for a marginal effect, defined as the alternative $\xi_0$ that has power 1/2 when given weight 1. Thus, the marginal effect is $\xi_0 = z_{\alpha/m}$. In the rest of this section, then, we assume that all nonzero $\xi_j$'s are equal to $\xi_0$. Of course, the validity of the procedure does not depend on this assumption being true.

Fix $0 < \varepsilon < 1$ and vary $B$. As we increase $B$, we will eventually reach a point $B_0(\varepsilon)$ where $R(B, \varepsilon) < 0$, which we call the turnaround point. Formally, $B_0(\varepsilon) = \sup\{B: R(B, \varepsilon) > 0\}$. The top panel in Figure 5 shows $B_0(\varepsilon)$ versus $\varepsilon$, which shows that for small $\varepsilon$ we can choose $B$ large without loss of power. The bottom panel shows $R(B, \varepsilon)$ for $\varepsilon = 0.1$. Ideally, for a given $\varepsilon$, one chooses $B$ near $B^*(\varepsilon)$, the value of $B$ that maximizes $R(B, \varepsilon)$.

Theorem 4.1. Fix $0 < \varepsilon < 1$. As a function of $B$, $R(B, \varepsilon)$ is unimodal and satisfies $R(1, \varepsilon) = 0$, $R'(1, \varepsilon) > 0$ and $R(\infty, \varepsilon) < 0$. Hence, $B_0(\varepsilon)$ exists and is unique. Also, $R(B, \varepsilon)$ has a unique maximum at some point $B^*(\varepsilon)$ and $R(B^*(\varepsilon), \varepsilon) > 0$.

When $\varepsilon$ is very small, $B$ can be large, provided $w_0 \approx 1$. For example, suppose we want to increase the chance of rejecting one particular hypothesis so that $\varepsilon = 1/m$. Then,

$$w_1 = \frac{mB}{B + m - 1} \approx B, \qquad w_0 = \frac{m}{B + m - 1} \approx 1,$$

and

$$\lim_{m \to \infty}\lim_{B \to \infty} \pi(\xi_j, w_1) = 1, \quad \text{while} \quad \lim_{m \to \infty}\lim_{B \to \infty} \pi(\xi_j, w_0) = \frac{1}{2}.$$

Fig. 5. Top plot: turnaround point $B_0(\varepsilon)$ versus $\varepsilon$. Bottom plot shows the robustness function $R(B, 0.1)$ versus $B$. The turnaround point $B_0(\varepsilon)$ is shown with a vertical dotted line.
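Theorem 4.1 suggests a simple numerical recipe for the turnaround point $B_0(\varepsilon)$ and the best choice $B^*(\varepsilon)$. The sketch below is our own (Python with SciPy; function names hypothetical). It evaluates $R(B, \varepsilon)$ at the marginal effect $\xi_0 = z_{\alpha/m}$ and doubles $B$ until $R < 0$. It then maximizes $R$ with `minimize_scalar` (bounded method) and finds $B_0$ with `brentq`. The unimodality needed for both steps is taken from the theorem rather than checked.

```python
from scipy.optimize import brentq, minimize_scalar
from scipy.stats import norm


def robustness(B, eps, alpha=0.05, m=1000):
    """R(B, eps) at the marginal effect xi_0 = z_{alpha/m}."""
    xi0 = norm.isf(alpha / m)
    denom = eps * B + (1 - eps)

    def power(w):
        return norm.sf(norm.isf(alpha * w / m) - xi0)

    return power(B / denom) + power(1 / denom) - 2 * power(1.0)


def turnaround(eps, alpha=0.05, m=1000):
    """Return (B0, B_star): the turnaround point and the maximizer of R(., eps)."""
    hi = 2.0
    while robustness(hi, eps, alpha, m) >= 0:  # find some B with R(B, eps) < 0
        hi *= 2
    res = minimize_scalar(lambda B: -robustness(B, eps, alpha, m),
                          bounds=(1.0, hi), method="bounded")
    b_star = res.x
    b0 = brentq(lambda B: robustness(B, eps, alpha, m), b_star, hi)
    return b0, b_star


for eps in (0.01, 0.05, 0.1):
    b0, b_star = turnaround(eps)
    print(f"eps = {eps}: B0 = {b0:.1f}, B* = {b_star:.1f}, R(B*) = {robustness(b_star, eps):.3f}")
```

As in the top panel of Figure 5, the turnaround point grows quickly as $\varepsilon$ shrinks, so sparse up-weighting tolerates very large $B$.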

The next results show that binary weighting schemes are optimal in a certain sense. Suppose we want at least a fraction $\varepsilon$ of the hypotheses to have high power $1 - \beta$, and otherwise we want to maximize the minimum power.

Theorem 4.2. Consider the following optimization problem: given $0 < \varepsilon < 1$ and $0 < \beta < 1/2$, find a vector $w = (w_1, \ldots, w_m)$ that maximizes $\min_j \pi(\xi_m, w_j)$ subject to $\bar w = 1$ and $\#\{j: \pi(\xi_m, w_j) \ge 1 - \beta\}/m \ge \varepsilon$. The solution is given by $c = \bar\Phi(z_{\alpha/m} + z_{1-\beta})$, $B = cm(1 - \varepsilon)/(\alpha - \varepsilon cm)$, $w_1 = B/(\varepsilon B + (1 - \varepsilon))$, $w_0 = 1/(\varepsilon B + (1 - \varepsilon))$ and $k = \varepsilon m$.

If our goal is to maximize the number of alterna- 
tives with high power while maintaining a minimum 
power loss, the solution is given as follows. 

Theorem 4.3. Consider the following optimization problem: given $0 < \beta < 1/2$, find a vector $w = (w_1, \ldots, w_m)$ that maximizes $\#\{j: \pi(\xi_m, w_j) \ge 1 - \beta\}$ subject to $\bar w = 1$ and $\min_j \pi(\xi_m, w_j) \ge \delta$. The solution is

$$w_1 = \frac{m}{\alpha}\,\bar\Phi(z_{\alpha/m} + z_{1-\beta}), \qquad w_0 = \frac{m}{\alpha}\,\bar\Phi(z_{\alpha/m} + z_\delta), \qquad \varepsilon = \frac{1 - w_0}{w_1 - w_0},$$

and $k = m\varepsilon$.

A special case that falls under this theorem permits the minimum power to be 0. In this case $w_0 = 0$ and $\varepsilon = 1/w_1$.
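The closed forms in Theorems 4.2 and 4.3 can be checked by plugging the resulting weights back into (2). The sketch below (Python with SciPy; function names are ours, and the alternative is placed at the marginal effect $z_{\alpha/m}$) computes both weight pairs. It verifies that the up-weighted tests reach power $1 - \beta$, that the down-weighted tests in Theorem 4.3 sit exactly at power $\delta$, and that the budget $\varepsilon w_1 + (1 - \varepsilon) w_0 = 1$ holds. The parameter values are illustrative.

```python
from scipy.stats import norm


def weights_thm42(eps, beta, alpha, m):
    """Theorem 4.2: binary weights giving power 1 - beta to a fraction eps."""
    c = norm.sf(norm.isf(alpha / m) + norm.isf(1 - beta))
    if alpha - eps * c * m <= 0:
        raise ValueError("infeasible: eps too large for this beta")
    B = c * m * (1 - eps) / (alpha - eps * c * m)
    denom = eps * B + (1 - eps)
    return B / denom, 1 / denom  # (w1, w0)


def weights_thm43(beta, delta, alpha, m):
    """Theorem 4.3: largest fraction eps with power 1 - beta, keeping power >= delta."""
    z = norm.isf(alpha / m)
    w1 = (m / alpha) * norm.sf(z + norm.isf(1 - beta))
    w0 = (m / alpha) * norm.sf(z + norm.isf(delta))
    eps = (1 - w0) / (w1 - w0)
    return w1, w0, eps


def power(w, alpha, m):
    """Power at the marginal effect z_{alpha/m} with weight w, from (2)."""
    return norm.sf(norm.isf(alpha * w / m) - norm.isf(alpha / m))


alpha, m = 0.05, 1000
w1, w0 = weights_thm42(eps=0.01, beta=0.2, alpha=alpha, m=m)
print(w1, w0, power(w1, alpha, m), power(w0, alpha, m))
v1, v0, eps = weights_thm43(beta=0.2, delta=0.3, alpha=alpha, m=m)
print(v1, v0, eps, power(v1, alpha, m), power(v0, alpha, m))
```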

5. ESTIMATED WEIGHTS 

In practice, $\xi_j$ is not known, so it must be estimated in order to use the weight function. A natural choice is to build on the two-stage experimental design (Satagopan and Elston, 2003; Wang et al., 2006) and split the data into subsets, using one subset to estimate $\xi_j$, and hence $w(\xi_j)$, and the second to conduct a weighted test of the hypothesis (Rubin, Dudoit and van der Laan, 2006). This approach would arise naturally in an association test conducted in stages. It does lead to a gain in power relative to unweighted testing of stage 2 data; however, it is not better than simply using the full data set without weights for the analysis (Rubin, Dudoit and van der Laan, 2006). These results are corroborated by Skol et al. (2006) in a related context. They showed that it is better to use stages 1 and 2 jointly, rather than using stage 2 as an independent replication of stage 1.

To gain a strong advantage with data-based weights, 
prior information is needed. One option is to order 
the tests (Rubin, Dudoit and van der Laan, 2006), 
but with a large number of tests this can be challenging. The type of prior information readily available to investigators is often nonspecific. For instance, SNPs might naturally be grouped, based on features that make various candidates more promising for the disease under investigation. For a brain-disorder
phenotype we might cross-classify SNPs by categor- 
ical variables such as functionality, brain expression 
and so forth. The SNPs in one group may seem most 
promising, a priori, while those in another seem least 
promising. Intermediate groups may be somewhat 
ambiguous. It is easy to imagine additional variables 
that further partition the SNPs into various classes 
that help to separate the more promising SNPs from 
the others. While this type of information lends it- 
self to grouping SNPs, it does not lead directly to 
weights for the groups. Indeed, it might not even be 
possible to choose a natural ordering of the groups. 
What is needed is a way to use the data to determine 
the weights, once the groups are formed. 

Until recently, methods for weighted multiple-testing 
required that prior weights be developed indepen- 
dently of the data under investigation (Genovese, 
Roeder and Wasserman, 2006; Roeder, Wasserman 
and Devlin, 2007). Here we provide a data-based es- 
timate of weights based on results of grouped analy- 
sis. One way to implement this approach is to follow 
these steps: 

1. Partition the tests into subsets $G_1, \ldots, G_K$, with the $k$th group containing $r_k$ elements, ensuring that $r_k$ is at least 20-30.

2. Calculate the sample mean and variance, $\bar Y_k$ and $S_k^2$, of the test statistics in each group.

3. Label the $i$th test in group $k$ as $T_{ik}$. At best, only a fraction of the elements in each group will have a signal; hence, we assume that for $i = 1, \ldots, r_k$ the distribution of the test statistics is approximated by a mixture model

$$T_{ik} \sim (1 - \pi_k)\, N(0, 1) + \pi_k\, N(\xi_k, 1)$$

or

$$T_{ik} \sim (1 - \pi_k)\, \chi^2_1(0) + \pi_k\, \chi^2_1(\xi_k^2),$$

where $\xi_k$ is the signal size for those tests with a signal in the $k$th group. This is an approximation because the signal is likely to vary across tests. The mixture of normals is only appropriate when the tests are one-sided. For two-sided alternatives, the $\chi^2$ model is the natural approach. Squaring the test statistic squares the noncentrality parameter, effectively removing any ambiguity about the direction of the associations.

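4. Estimate $(\pi_k, \xi_k)$ using the method of moments estimator (for details see the Appendix). Because $\xi_k$ has no meaning when $\pi_k = 0$, $\hat\xi_k$ is set to 0 when $\hat\pi_k$ is close to zero. For the normal model the estimators are

$$\hat\pi_k = \frac{\bar Y_k^2}{\bar Y_k^2 + S_k^2 - 1}, \qquad \hat\xi_k = \frac{\bar Y_k}{\hat\pi_k}, \tag{11}$$

provided $\hat\pi_k > 1/r_k$; otherwise $\hat\xi_k = 0$. For the $\chi^2$ model they are

$$\hat\xi_k^2 = \frac{S_k^2 + \bar Y_k^2 - 3}{\bar Y_k - 1} - 6, \qquad \hat\pi_k = \frac{\bar Y_k - 1}{\hat\xi_k^2}, \tag{12}$$

provided $\bar Y_k > 1$ and $1/r_k < \hat\pi_k < (r_k - 1)/r_k$; otherwise $\hat\xi_k = 0$.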

5. For each of the $K$ groups, construct weights $w(\hat\xi_k)$. It is apparent in Figure 1 that if $|\hat\xi_k| < \delta$, for $\delta$ near 0, then $w(\hat\xi_k) \approx 0$ and it is unlikely that any tests in the $k$th group will be significant, regardless of the p-value. The stochastic quantity $\delta$ depends upon the relative values of $(\hat\xi_1, \ldots, \hat\xi_K)$ and the number of elements in each group. For this reason we have found that smoothing the weights generally improves the power of the procedure. We suggest using a linear combination such as

$$\tilde w_k = (1 - \gamma)\, w(\hat\xi_k) + \gamma K$$

with $\gamma = 0.01$ or $0.05$. The larger the choice of $\gamma$, the more evenly the weights are distributed across groups. Alternatively, one could smooth the weights by using a Stein shrinkage estimator or a bagging procedure to obtain a more robust estimator of $(\xi_1, \ldots, \xi_K)$ (Hastie, Tibshirani and Friedman, 2001). Regardless of how the weights are smoothed, one should renormalize them so that the weights sum to $m$. Each test in group $k$ receives the weight $\tilde w_k$. Another effect of the smoothing is to ensure that each group gets a weight greater than 0.
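Steps 1 to 5 can be strung together as below. This is a minimal sketch, not the authors' code: it applies the optimal weight function of Theorem 3.1, $w(\xi) = (m/\alpha)\bar\Phi(\xi/2 + c/\xi)$, per group, finds $c$ by bisection so that $\sum_k r_k w(\hat\xi_k) = m$, then applies $(1-\gamma)w(\hat\xi_k) + \gamma K$ with $K$ taken as the number of groups and renormalizes. Setting $\hat\xi_k = 0$ when $\bar Y_k \le 0$ in the normal model, and falling back to unit weights when no group has $\hat\xi_k > 0$, are our own assumptions; all function names are hypothetical.

```python
import math
from statistics import fmean, variance


def sf(x):
    """Upper-tail normal probability bar-Phi(x) = P(Z > x)."""
    return 0.5 * math.erfc(x / math.sqrt(2))


def mom_normal(ybar, s2, r):
    """(11): moment fit of (1 - pi) N(0, 1) + pi N(xi, 1); returns (pi, xi)."""
    denom = ybar ** 2 + s2 - 1
    if denom <= 0 or ybar <= 0:  # assumption: no positive signal, so xi = 0
        return 0.0, 0.0
    pi = ybar ** 2 / denom
    return (pi, ybar / pi) if pi > 1 / r else (pi, 0.0)


def mom_chisq(ybar, s2, r):
    """(12): moment fit of (1 - pi) chi2_1(0) + pi chi2_1(xi^2); returns (pi, xi)."""
    if ybar <= 1:
        return 0.0, 0.0
    xi2 = (s2 + ybar ** 2 - 3) / (ybar - 1) - 6
    if xi2 <= 0:
        return 0.0, 0.0
    pi = (ybar - 1) / xi2
    return (pi, math.sqrt(xi2)) if 1 / r < pi < (r - 1) / r else (pi, 0.0)


def estimate_xis(groups, fit=mom_normal):
    """Steps 2-4: sample mean and variance per group, then the moment estimator."""
    return [fit(fmean(g), variance(g), len(g))[1] for g in groups]


def group_weights(xis, sizes, alpha, gamma=0.05):
    """Step 5: per-group w(xi) = (m/alpha) bar-Phi(xi/2 + c/xi), smoothed as
    (1 - gamma) w + gamma K, then rescaled so that sum_k r_k w_k = m."""
    m, K = sum(sizes), len(xis)
    if not any(x > 0 for x in xis):
        return [1.0] * K  # assumption: fall back to unweighted Bonferroni

    def total(c):  # total raw weight over all tests; strictly decreasing in c
        return sum(r * (m / alpha) * sf(x / 2 + c / x)
                   for x, r in zip(xis, sizes) if x > 0)

    lo, hi = -1.0, 1.0
    while total(lo) < m:
        lo *= 2
    while total(hi) > m:
        hi *= 2
    for _ in range(200):  # bisection for total(c) = m
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if total(mid) > m else (lo, mid)
    c = (lo + hi) / 2
    raw = [(m / alpha) * sf(x / 2 + c / x) if x > 0 else 0.0 for x in xis]
    smooth = [(1 - gamma) * w + gamma * K for w in raw]
    scale = m / sum(r * w for r, w in zip(sizes, smooth))
    return [w * scale for w in smooth]


weights = group_weights([0.0, 1.0, 3.0], [100, 100, 100], alpha=0.05)
```

Each test in group $k$ would then be rejected when its p-value falls below $\alpha\,\tilde w_k/m$. The smoothing keeps the group with $\hat\xi_k = 0$ at a small positive weight rather than excluding it.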

This weighting scheme relies on data-based estimators of the optimal weights, but with a partition of the data sufficiently crude to preserve the control of the family-wise error rate. The approach is an example of the "sieve principle" (Bickel et al., 1993). The sieve principle works because the number of parameters estimated is far less than the number of observations. Thus, many observations are used to estimate each parameter. Consequently, parameters are estimated with substantially less variability than if they were estimated using only the test statistics from the particular gene under investigation. Because the weights are determined by the sizes of the test statistics in the entire group, the probability of up-weighting a test simply because it is large by chance is small.

6. EXAMPLES 

6.1 Binary Weights 

In a study of nicotine dependence, Saccone et al. (2007) used binary weights in a candidate gene study. Their study involved 3713 genetic variants (single nucleotide polymorphisms or SNPs) encompassing 348 genes. The genes were divided into two types: 52 nicotinic and dopaminergic receptor genes, and 296 other candidate genes. Each SNP associated with a gene in the first group was allocated ten times the weight of a SNP associated with a gene in the other category. Using a generous false discovery rate ($\alpha = 0.4$), they identified 39 SNPs; 78% of these were in nicotinic receptor genes, in contrast to the overall fraction of nicotinic receptor SNPs (15%).
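A sketch of binary weighting combined with an FDR procedure follows. It assumes the weighted Benjamini-Hochberg rule of Genovese, Roeder and Wasserman (2006), which applies the usual step-up rule to $P_j/w_j$ with weights averaging 1; we have not confirmed that Saccone et al. (2007) implemented it exactly this way, and the function names and toy p-values are hypothetical.

```python
def binary_weights(flags, ratio=10.0):
    """Weight `ratio` for flagged tests and 1 for the rest, rescaled to mean 1."""
    raw = [ratio if f else 1.0 for f in flags]
    mean = sum(raw) / len(raw)
    return [w / mean for w in raw]


def weighted_bh(pvalues, weights, q):
    """Benjamini-Hochberg step-up rule applied to p_j / w_j; returns rejected indices."""
    m = len(pvalues)
    ranked = sorted((p / w, j) for j, (p, w) in enumerate(zip(pvalues, weights)))
    k = 0
    for i, (adj, _) in enumerate(ranked, start=1):
        if adj <= q * i / m:
            k = i  # largest rank passing the step-up condition
    return sorted(j for _, j in ranked[:k])


pvalues = [0.001, 0.02, 0.03, 0.5]
weights = binary_weights([False, True, False, False])
print(weighted_bh(pvalues, weights, q=0.1))
```

Down-weighting has a cost: in this toy example the weighted rule rejects tests 0 and 1, whereas unit weights would also reject test 2.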

6.2 Independent Data Weights 

For family-based study designs, tests of associa- 
tion are based on transmission data. In these stud- 
ies, data are available from which one can compute 
the potential power to detect a signal at each SNP 
tested; see Ionita-Laza et al. (2007) for a detailed 
explanation of this unique feature of family-based 
data. Because the data used to calculate the power 
are independent of the test statistics for associa- 
tion, these data are available for construction of the 
weights. Motivated by this possibility, Ionita-Laza 
et al. (2007) developed a weighting scheme. Using 
independent data, they ranked the SNPs from most 
to least promising, in terms of power. They then con- 
structed an exponential weighting scheme, based on 
simulations of genetic models. The scheme results in 
a small number of SNPs receiving a top weight, suc- 
cessively more SNPs receiving correspondingly lower 
weights, and finally a large number receiving the 
lowest weight. In their simulations they found that the power of the test can often be doubled using this procedure. Using the FHS data, they applied the technique to a genome-wide association study with 116,204 SNPs and 923 participants. The phenotype of interest was height. Using their weighting scheme, they obtained one significant result with weights and none without weights.
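The shape of such a scheme can be sketched as follows. This is a hypothetical construction, not the one fitted by Ionita-Laza et al. (2007): tier $i$ holds $k \cdot 2^{i-1}$ tests, taken in order of decreasing estimated power, and spends $\alpha/2^i$ of the error budget, so each test in tier $i$ has weight $m\,2^{-i}/(k\,2^{i-1})$. The weights sum to less than $m$, so the union-bound argument in the proof of Lemma 2.1 still gives a familywise error of at most $\alpha$.

```python
def tiered_weights(m, k):
    """Hypothetical exponential tiers over tests ranked by estimated power.

    Tier i (i = 1, 2, ...) contains k * 2**(i - 1) tests and spends alpha / 2**i
    of the error budget, so each of its tests gets weight m * 2**-i / (k * 2**(i - 1)).
    The last tier is truncated at m tests; the weights then sum to less than m.
    """
    weights = []
    i = 1
    while len(weights) < m:
        size = k * 2 ** (i - 1)
        w = m * 2.0 ** -i / size
        weights.extend([w] * min(size, m - len(weights)))
        i += 1
    return weights


def reject(pvalues_by_rank, weights, alpha):
    """Weighted Bonferroni: the test at rank r is rejected if p < alpha * w_r / m."""
    m = len(weights)
    return [p < alpha * w / m for p, w in zip(pvalues_by_rank, weights)]


w = tiered_weights(m=100, k=5)
print(w[0], w[5], round(sum(w), 4))
```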

6.3 Linkage Weights 

Finding variation in the genetic code that increases 
the risk for complex diseases, such as Type II di- 
abetes and schizophrenia, is critically important to 
the advancement of genetic epidemiology. In the 
Introduction we describe a means by which weights 
could be extracted from linkage data. Here we illus- 
trate the idea with both data and simulations. 

In the analysis of 955 cases and 1498 controls enrolled in a genome-wide association study, McQueen and colleagues (2008) used weights derived from published linkage results. They combined results from 11 linkage studies on bipolar disorder to obtain Z scores corresponding to the locations of each association test. From the linkage results they computed weighted p-values using the cumulative normal weight function (Roeder et al., 2006). Although none of their results were genome-wide significant, they obtained promising results in four regions. Three of these arose from strong p-values in combination with a linkage peak. One signal did not correspond to a linkage peak, but remained in the top tier of p-values after weights were applied.

To illustrate how binary weights could be derived 
from such linkage data, we present a realistic syn- 
thetic example. Using the methods described in 
Roeder et al. (2006), we create a linkage trace that 
captures many of the features found in actual linkage 
traces. In this simulation we generate a full genome 
(23 chromosomes) and place 20 disease variants at 
random, one per chromosome. The signals from these 
variants were designed to yield weak signals with 
broad peaks. Next, we simulated 100,000 normally 
distributed association test statistics mapped to the 
same genome. Again, 20 of these tests were gener- 
ated under the alternative hypothesis of association. 
These signals were also weak. 

To illustrate the synthetic data, six typical chromosomes are displayed in Figure 1. Each displayed chromosome has one true signal, with the association test statistic at that location indicated by an up-spike; none of the association tests generated under the null hypothesis are plotted. Without weights, only 2 of the 20 signals could be detected using a Bonferroni correction. Using binary weights, as described above, with $\epsilon = 0.05$ and $B = 10$, we discover 5 of the 20 signals. In the left column of the figure all three signals were discovered, while in the right column none were discovered (indicated by the presence of a down-spike). Comparing the two panels in the top row, we see that both signals were up-weighted in the correct location, but the association signal was not strong enough in the top right chromosome to achieve significance. Alternatively, in the bottom left panel the association statistic was large enough to reject the null hypothesis without the benefit of up-weighting.

To examine the robustness of the procedure to the choice of weights, we tried four choices of $\epsilon$ (0.01, 0.05, 0.1, 0.2) with $1 \le B \le 50$. We made no false discoveries with any of these choices. The power is displayed in Figure 6. To assist in the choice of parameters, we have found it helpful to examine the number of discoveries for each choice. In this example, the number of discoveries ranged from 2 (unweighted, i.e., $B = 1$) to 6 ($\epsilon = 0.2$, $B > 10$). Five discoveries were made for a broad range of choices. In principle, choosing $(\epsilon, B)$ to maximize the number of discoveries can inflate the error rate. In our simulations, however, we have found that searching within a family of weights defined by one or two parameters, such as this binary weight system based upon a linkage trace, tends to provide very close to nominal protection against false discoveries.
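Tabulating discoveries over a grid of $(\epsilon, B)$ can be sketched as below. Because the binary scheme is defined elsewhere in the paper, this assumes one concrete version: the top fraction $\epsilon$ of tests ranked by linkage score receive $B$ times the weight of the others, with all weights rescaled to average 1. The function names and toy data are hypothetical.

```python
def binary_linkage_weights(linkage, eps, B):
    """Assumed binary weights: the top eps fraction of tests by linkage score get
    B times the weight of the others; all weights are rescaled to average 1."""
    m = len(linkage)
    n_up = max(1, round(eps * m))
    order = sorted(range(m), key=lambda j: linkage[j], reverse=True)
    up = set(order[:n_up])
    norm = (n_up * B + (m - n_up)) / m  # mean of the unscaled weights
    return [(B if j in up else 1.0) / norm for j in range(m)]


def bonferroni_discoveries(pvalues, weights, alpha):
    """Number of tests with p_j < alpha * w_j / m (weighted Bonferroni)."""
    m = len(pvalues)
    return sum(p < alpha * w / m for p, w in zip(pvalues, weights))


def discovery_grid(pvalues, linkage, alpha, eps_values, B_values):
    """Number of discoveries for every (eps, B) pair, for inspection."""
    return {
        (eps, B): bonferroni_discoveries(
            pvalues, binary_linkage_weights(linkage, eps, B), alpha)
        for eps in eps_values
        for B in B_values
    }


# Toy data (hypothetical): 1000 tests, two small p-values, one under a linkage peak.
pvalues = [0.5] * 1000
pvalues[3], pvalues[7] = 1e-5, 1e-4
linkage = [0.0] * 1000
linkage[7] = 5.0
print(discovery_grid(pvalues, linkage, 0.05, [0.01], [1, 10]))
```

With $B = 1$ only the test with $p = 10^{-5}$ passes $\alpha/m$; with $B = 10$ the up-weighted test under the linkage peak is also detected.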

7. DISCUSSION 

Several authors have explored the effect of weights 
on the power of multiple testing procedures [e.g., 
Westfall, Kropf and Finos (2004)]. These investiga- 
tions show that the power of multiple testing proce- 
dures can be increased by using weighted p-values. 
Here we derive the optimal weights for a commonly 
used family of tests and show that the power is re- 
markably robust to misspecification of these weights. 

The same ideas used here can be applied to other 
testing methods to improve power. In particular, 
weights can be added to the FDR method, Holm's 
stepdown test, and the Donoho and Jin (2004) method. 
Weighting ideas can also be used for confidence in- 
tervals. Another open question is the connection with 
Bayesian methods which have already been devel- 
oped to some extent in Efron et al. (2001). 

GWAS for some phenotypes such as Type 1 diabetes have yielded exciting results (Todd et al., 2007), while results for other complex diseases have been much less successful. Presumably, many studies do not have sufficient power to detect the genetic variants associated with the phenotypes, even though thousands of cases and controls have been genotyped. To bolster power, we recommend up-weighting and down-weighting hypotheses, based on prior likelihood of association with the phenotype. For instance, Wang, Li and Bucan (2007) describe pathway-based approaches for the analysis of GWAS.

[Figure 6: four panels, for epsilon = 0.01, 0.05, 0.1 and 0.2, each plotting power against B.]

Fig. 6. Power as a function of B and ε.

Multiple testing arises in GWAS analyses in other 
contexts as well. Frequently, multiple tests, assum- 
ing different genetic models, are applied to each ge- 
netic marker. Multiple markers in a neighborhood 
can be analyzed simultaneously to increase the sig- 
nal, using haplotypes, multivariate models and fine- 
mapping techniques. Data are often collected in mul- 
tiple stages of the experiment, and at each stage 
promising markers are tested for association. In sum- 
mary, many questions concerning multiple testing 
remain open in the context of GWAS. 
Proof of Lemma 2.1. The familywise error is

$$P\Big(P_j < \frac{\alpha w_j}{m} \text{ for some } j \in \mathcal{H}_0\Big) \le \sum_{j \in \mathcal{H}_0} \frac{\alpha w_j}{m} \le \frac{\alpha}{m} \sum_{j=1}^{m} w_j = \alpha \bar w = \alpha.$$

□
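The bound in Lemma 2.1 can be checked by simulation. For the simulation only, the sketch below assumes that all hypotheses are null with independent uniform p-values (the lemma itself needs no independence), and compares a Monte Carlo estimate of the familywise error with its exact value $1 - \prod_j (1 - \alpha w_j/m)$ under independence. The function names are ours.

```python
import math
import random


def exact_fwer(weights, alpha):
    """FWER of weighted Bonferroni with independent uniform null p-values."""
    m = len(weights)
    return 1 - math.prod(1 - min(1.0, alpha * w / m) for w in weights)


def simulated_fwer(weights, alpha, reps=20_000, seed=1):
    """Monte Carlo estimate: fraction of replicates with any p_j < alpha * w_j / m."""
    rng = random.Random(seed)
    m = len(weights)
    thresholds = [alpha * w / m for w in weights]
    hits = sum(any(rng.random() < t for t in thresholds) for _ in range(reps))
    return hits / reps


for weights in ([1.0] * 4, [4.0, 0.0, 0.0, 0.0], [2.0, 1.0, 0.5, 0.5]):
    print(weights, round(exact_fwer(weights, 0.05), 4), simulated_fwer(weights, 0.05))
```

Putting all the weight on one hypothesis, as in the second case, makes the union bound tight: the error rate is exactly $\alpha$.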



Proof of Lemma 2.2. The familywise error is

$$P\Big(P_j < \frac{\alpha W_j}{m} \text{ for some } j \in \mathcal{H}_0\Big) \le \sum_{j \in \mathcal{H}_0} \frac{\alpha\, E(W_j)}{m} \le \frac{\alpha}{m}\, E\Big(\sum_{j=1}^{m} W_j\Big) \le \alpha.$$

□



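Proof of Theorem 3.1. Let $C$ denote the set of hypotheses with $\xi_j > 0$. Power is optimized if $w_j = 0$ for $j \notin C$. The average power is

$$\frac{1}{m_1} \sum_{j \in C} \bar\Phi\Big(\bar\Phi^{-1}\Big(\frac{\alpha w_j}{m}\Big) - \xi_j\Big)$$

with constraint $\sum_{j \in C} w_j = m$. Choose $w$ to maximize

$$\frac{1}{m_1} \sum_{j \in C} \bar\Phi\Big(\bar\Phi^{-1}\Big(\frac{\alpha w_j}{m}\Big) - \xi_j\Big) - \lambda\Big(m - \sum_{j \in C} w_j\Big)$$

by setting the derivative to zero:

$$\frac{\alpha}{m\, m_1}\, \frac{\phi\big(\bar\Phi^{-1}(\alpha w_j/m) - \xi_j\big)}{\phi\big(\bar\Phi^{-1}(\alpha w_j/m)\big)} + \lambda = 0, \qquad\text{that is,}\qquad \frac{\phi\big(\bar\Phi^{-1}(\alpha w_j/m) - \xi_j\big)}{\phi\big(\bar\Phi^{-1}(\alpha w_j/m)\big)} = -\frac{m\, m_1 \lambda}{\alpha}.$$

Since $\phi(z - \xi)/\phi(z) = \exp(z\xi - \xi^2/2)$, the left-hand side is the same for every $j \in C$ only if $\bar\Phi^{-1}(\alpha w_j/m) = \xi_j/2 + c/\xi_j$ for a common constant $c$. The $w_j$ that solves these equations is therefore the one given in (3). Finally, solve for $c$ such that $\sum_j w_j = m$. □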

Proof of Theorem 3.4. The first statement follows easily by noting that the worst case corresponds to choosing weight $B$ in the first term in $R(\xi)$ and choosing weight $b$ in the second term in $R(\xi)$. The rest follows by Taylor expanding $R_{b,B}(\xi)$ around $b = 1$. □

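Proof of Lemma 3.5. With $b = 0$, $R_{b,B}(\xi) > 0$ when

$$\bar\Phi(z_{B\alpha/m} - \xi) - 2\bar\Phi(z_{\alpha/m} - \xi) > 0. \tag{13}$$

With $B > 2$, (13) holds at $\xi = 0$. The left-hand side is increasing in $\xi$ for $\xi$ near 0, but (13) does not hold at $\xi = z_{\alpha/m}$. So (13) must hold on an interval $[0, \xi_0]$. Rewrite (13) as $\bar\Phi(z_{B\alpha/m} - \xi) - \bar\Phi(z_{\alpha/m} - \xi) > \bar\Phi(z_{\alpha/m} - \xi)$. We lower bound the left-hand side and upper bound the right-hand side. The left-hand side is

$$\bar\Phi(z_{B\alpha/m} - \xi) - \bar\Phi(z_{\alpha/m} - \xi) = \int_{z_{B\alpha/m} - \xi}^{z_{\alpha/m} - \xi} \phi(u)\, du \ge (z_{\alpha/m} - z_{B\alpha/m})\, \phi(z_{\alpha/m} - \xi),$$

where the inequality uses that $\phi$ is decreasing on the range of integration when $\xi \le z_{B\alpha/m}$. The right-hand side can be bounded using Mills' ratio: $\bar\Phi(z_{\alpha/m} - \xi) \le \phi(z_{\alpha/m} - \xi)/(z_{\alpha/m} - \xi)$. Set the lower bound greater than the upper bound to obtain the stated result. □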
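A quick numerical check of this argument (function names ours): setting the lower bound above the upper bound gives $\xi < z_{\alpha/m} - 1/(z_{\alpha/m} - z_{B\alpha/m})$, valid where $\xi \le z_{B\alpha/m}$. The sketch evaluates the left-hand side of (13) on a grid below that bound, where it should be positive, and at $\xi = z_{\alpha/m}$, where it should be negative.

```python
import math
from statistics import NormalDist

_N = NormalDist()


def sf(x):
    """bar-Phi(x) = P(Z > x)."""
    return 0.5 * math.erfc(x / math.sqrt(2))


def z_upper(p):
    """Upper-p quantile z_p."""
    return -_N.inv_cdf(p)


def lhs_13(xi, B, alpha, m):
    """Left-hand side of (13)."""
    return sf(z_upper(B * alpha / m) - xi) - 2 * sf(z_upper(alpha / m) - xi)


def proof_bound(B, alpha, m):
    """Largest xi covered by the Mills-ratio argument (requires B > 1)."""
    za, zb = z_upper(alpha / m), z_upper(B * alpha / m)
    return min(zb, za - 1 / (za - zb))


alpha, m = 0.05, 100_000
for B in (3, 10, 50):
    bound = proof_bound(B, alpha, m)
    grid = [bound * i / 100 for i in range(100)]
    print(B, round(bound, 3), all(lhs_13(x, B, alpha, m) > 0 for x in grid))
```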



PROOF of Lemma 3.6. Choose K > 1 such that 
1/(K + 1) < l/a - e. Choose 1 > 7 > (2q - a )/K. 
Choose a small c > 0. Let ^ = A + \/A 2 — 2c and 



u 



B 



A = § 



2c, where 
1 



a 



B 



¥ _1 



(m(7X + a)) 
(mfrlf + a)) 



Then p(f) = l/a and /5(f) = \/{K + 1). Now d(Q, 
Q) = 7. Taking i'T sufficiently large and 7 sufficiently 
close to (2a — a) /K makes 7 < <5. □ 

It is convenient to prove Theorem 3.8 before proving Theorem 3.7.

Proof of Theorem 3.8. Let $c_*$ solve

$$\gamma\, \bar\Phi\big(\sqrt{2c_*}\big) + a\, \bar\Phi\Big(\frac{\xi}{2} + \frac{c_*}{\xi}\Big) = \frac{\alpha}{m}. \tag{14}$$

We claim first that, for any $c > c_*$, there is no $u$ such that the weights average to 1. Fix $c > c_*$. The weights average to 1 if and only if

$$\gamma\, \bar\Phi\Big(\frac{c}{u} + \frac{u}{2}\Big) + a\, \bar\Phi\Big(\frac{\xi}{2} + \frac{c}{\xi}\Big) = \frac{\alpha}{m}. \tag{15}$$

Since $c > c_*$ and since the second term is decreasing in $c$, we must have

$$\bar\Phi\Big(\frac{c}{u} + \frac{u}{2}\Big) > \bar\Phi\big(\sqrt{2c_*}\big).$$

The function $r(u) = \bar\Phi(c/u + u/2)$ is maximized at $u = \sqrt{2c}$. So $r(\sqrt{2c}) \ge r(u)$. But $r(\sqrt{2c}) = \bar\Phi(\sqrt{2c})$. Hence, $\bar\Phi(\sqrt{2c}) \ge r(u) > \bar\Phi(\sqrt{2c_*})$. This implies $c < c_*$, which is a contradiction. This establishes that $\sup_u c(u) \le c_*$. On the other hand, taking $c = c_*$ and $u = \sqrt{2c_*}$ solves equation (15). Thus, $c_*$ is indeed the largest $c$ that solves the equation, which establishes the first claim. The second claim follows by noting that the left-hand side of (14) is

$$\gamma\, \bar\Phi\big(\sqrt{2c_*}\big) + O(a).$$

Now set this expression equal to $\alpha/m$ and solve. □

Proof of Theorem 3.7. Define $c_*$ as in (14). If $u_* = \sqrt{2c_*} < \xi$, then the proof proceeds as in the previous proof. So we first need to establish for which values of $\xi$ this is true. Let $r(c) = \gamma\, \bar\Phi(\sqrt{2c}) + a\, \bar\Phi(\xi/2 + c/\xi)$. We want to find out when the solution of $r(c) = \alpha/m$ is such that $\sqrt{2c} < \xi$, or, equivalently, $c < \xi^2/2$. Now $r$ is decreasing in $c$. Since $\gamma + a > \alpha/m$, $r(-\infty) > \alpha/m$. Hence, there is a solution with $c < \xi^2/2$ if and only if $r(\xi^2/2) < \alpha/m$. But $r(\xi^2/2) = (\gamma + a)\, \bar\Phi(\xi)$, so we conclude that there is such a solution if and only if $(\gamma + a)\, \bar\Phi(\xi) < \alpha/m$, that is, $\xi > z_{\alpha/(m(\gamma + a))} = \xi_0$.

Now suppose that $\xi < \xi_0$. We need to find $u \le \xi$ to make $c$ as large as possible in the equation

$$v(u, c) = \gamma\, \bar\Phi\Big(\frac{u}{2} + \frac{c}{u}\Big) + a\, \bar\Phi\Big(\frac{\xi}{2} + \frac{c}{\xi}\Big) = \frac{\alpha}{m}.$$

Let $u^* = \xi$ and $c^* = \xi z_{\alpha/(m(\gamma + a))} - \xi^2/2$. By direct substitution, $v(u^*, c^*) = \alpha/m$ for this choice of $u$ and $c$, and, clearly, $u^* \le \xi$ as required. We claim that $c^*$ is the largest possible $c$. To see this, suppose $c > c^*$ and note that $v(u, c) < v(u, c^*)$. For $\xi < \xi_0$, $v(u, c^*)$ is an increasing function of $u$ on $(0, \xi]$, because $u/2 + c^*/u$ is decreasing for $u < \sqrt{2c^*}$ and $\xi < \sqrt{2c^*}$ when $\xi < \xi_0$. Hence, $v(u, c) < v(u, c^*) \le v(u^*, c^*) = \alpha/m$. This contradicts the fact that $v(u, c) = \alpha/m$.

For the second claim, note that the power of the weighted test beats the power of Bonferroni if and only if the weight $w = (m/\alpha)\, \bar\Phi(\xi/2 + C(\xi)/\xi) > 1$, which is equivalent to

$$C(\xi) < \xi z_{\alpha/m} - \xi^2/2. \tag{16}$$

When $\xi < \xi_0$, $C(\xi) = \xi\xi_0 - \xi^2/2$. By assumption, $\gamma + a < 1$, so that $\xi_0 = z_{\alpha/(m(\gamma + a))} < z_{\alpha/m}$, and (16) holds. Now suppose that $\xi_0 < \xi < \xi_*$. Then $C(\xi)$ is the solution to $r(c) = \gamma\, \bar\Phi(\sqrt{2c}) + a\, \bar\Phi(\xi/2 + c/\xi) = \alpha/m$. We claim that (16) still holds. Suppose not. Then, since $r(c)$ is decreasing in $c$, $r(\xi z_{\alpha/m} - \xi^2/2) \ge r(C(\xi)) = \alpha/m$. But, by direct calculation, $r(\xi z_{\alpha/m} - \xi^2/2) \ge \alpha/m$ implies that $\xi \ge \xi_*$, which is a contradiction. Thus, (7) holds.

Finally, we turn to (8). In this case, $C(\xi) = z^2_{\alpha/(m\gamma)}/2 + O(a)$. The worst case power is

$$\bar\Phi\Big(\frac{C(\xi)}{\xi} - \frac{\xi}{2}\Big) = \bar\Phi\Big(\frac{z^2_{\alpha/(m\gamma)}}{2\xi} - \frac{\xi}{2}\Big) + O(a).$$

The latter is increasing in $\xi$ and so is at least

$$\bar\Phi\Big(\frac{z^2_{\alpha/(m\gamma)}}{2\xi_*} - \frac{\xi_*}{2}\Big) + O(a) = \bar\Phi\Big(\frac{z^2_{\alpha/(m\gamma)} - \xi_*^2}{2\xi_*}\Big) + O(a),$$

as claimed. The next two equations follow from standard tail approximations for Gaussians. Specifically, a Gaussian quantile $z_{\beta/m}$ can be written as $z_{\beta/m} = \sqrt{2 \log(m L_m/\beta)}$, where $L_m = c \log^a(m)$ for constants $a$ and $c$ [Donoho and Jin (2004)]. Inserting this into the previous expression yields the final expression. □
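For $\xi < \xi_0$ the proof gives closed forms that can be checked directly. The sketch below uses the same reading of $\gamma$ and $a$ and illustrative values. It verifies that $u^* = \xi$, $c^* = \xi\xi_0 - \xi^2/2$ solves $v(u, c) = \alpha/m$, that $v(u, c^*) \le v(u^*, c^*)$ for $u \le \xi$ (the comparison the argument relies on), and that (16) holds, so the weighted power $\bar\Phi(C(\xi)/\xi - \xi/2) = \bar\Phi(\xi_0 - \xi)$ exceeds the Bonferroni power $\bar\Phi(z_{\alpha/m} - \xi)$.

```python
import math
from statistics import NormalDist

_N = NormalDist()


def sf(x):
    """bar-Phi(x) = P(Z > x)."""
    return 0.5 * math.erfc(x / math.sqrt(2))


def z_upper(p):
    """Upper-p quantile z_p."""
    return -_N.inv_cdf(p)


def v(u, c, xi, gamma, a):
    """v(u, c) from the proof of Theorem 3.7."""
    return gamma * sf(u / 2 + c / u) + a * sf(xi / 2 + c / xi)


xi, gamma, a, alpha, m = 2.0, 0.1, 0.01, 0.05, 10_000
xi0 = z_upper(alpha / (m * (gamma + a)))
assert xi < xi0  # the branch of the proof being checked
c_star = xi * xi0 - xi ** 2 / 2
weighted_power = sf(c_star / xi - xi / 2)      # equals sf(xi0 - xi)
bonferroni_power = sf(z_upper(alpha / m) - xi)
print(round(v(xi, c_star, xi, gamma, a) * m / alpha, 9), weighted_power > bonferroni_power)
```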

Proof of Theorem 4.2. Setting $\pi(w, \xi_m) = \bar\Phi(\bar\Phi^{-1}(w\alpha/m) - \xi_m)$ equal to $1 - \beta$ implies $w = (m/\alpha)\, \bar\Phi(z_{1-\beta} + z_{\alpha/m})$, which is equal to $w_1$ as stated in the theorem. The stated form of $w_0$ implies that the weights average to 1. The stated solution thus satisfies the restriction that a fraction $\epsilon$ have power at least $1 - \beta$. Increasing the weight of any hypothesis whose weight is $w_0$ necessitates reducing the weight of another hypothesis. This either reduces the minimum power or forces a hypothesis with power $1 - \beta$ to fall below $1 - \beta$. Hence, the stated solution does in fact maximize the minimum power. □